Structural Maximum a Posteriori Adaptation for Mixture Stochastic Trajectory Framework
Abstract
In this paper we address the problem of adapting a speech recognition system to a new environment. The aim of adaptation is to compensate for the mismatch between training and testing conditions without completely retraining the recognition system. The questions are what has to be compensated and how. We propose to compensate the means and variances of the Gaussian pdfs that represent the acoustic models, using linear transformations estimated under the maximum likelihood (ML) and maximum a posteriori (MAP) criteria. To better account for the variability of the adaptation data, the pdfs of the models are organised in a tree. This tree structure is also used to define the prior densities of the transformations. The approach is called Structural Maximum a Posteriori (SMAP) adaptation. SMAP is developed here for a segment-based model, the Mixture Stochastic Trajectory Model (MSTM). Experimental results on the RM task for supervised speaker adaptation show that SMAP significantly outperforms MLLR adaptation for the same amount of adaptation data and the same number of transformation parameters.
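The idea of a tree-structured prior can be illustrated with a deliberately simplified sketch. It is not the SMAP/MSTM formulation of the paper: it assumes scalar features, unit variances, a hard assignment of adaptation frames to Gaussians, and a transformation reduced to a single mean shift per node. The names `Node`, `smap_shifts` and the prior weight `tau` are hypothetical. Each node interpolates between the ML shift estimated from its own data and the MAP shift of its parent, so that Gaussians with little or no adaptation data fall back on the estimate of a broader class.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Tree node (hypothetical layout); only leaves carry a Gaussian mean."""
    name: str
    mean: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[Node]:
    """Collect all leaf Gaussians below a node."""
    if not node.children:
        return [node]
    found: List[Node] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def smap_shifts(node: Node, data: Dict[str, List[float]], tau: float = 5.0,
                parent_shift: float = 0.0,
                out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return {leaf name: MAP mean shift}.

    data maps a leaf name to the adaptation frames assigned to it.
    tau (> 0, assumed) weights the prior, which is the parent's MAP shift;
    the root uses a zero-shift prior.
    """
    if out is None:
        out = {}
    # Residuals of every frame under this node w.r.t. its own Gaussian mean.
    resid = [x - leaf.mean
             for leaf in leaves(node)
             for x in data.get(leaf.name, [])]
    n = len(resid)
    ml_shift = sum(resid) / n if n else 0.0
    # MAP estimate: count-weighted mix of the ML shift and the parent prior.
    shift = (n * ml_shift + tau * parent_shift) / (n + tau)
    if not node.children:
        out[node.name] = shift
    for child in node.children:
        smap_shifts(child, data, tau, shift, out)
    return out
```

Under these assumptions the adapted mean of a leaf is its original mean plus the returned shift. A leaf with no adaptation frames simply inherits its parent's shift, which is the behaviour the tree structure is meant to provide when adaptation data are sparse.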
Similar resources
Tree-structured Maximum a Po for a Segment-based Speech R
In this paper, the problem of the adaptation of a speech recognition system to a new environment is addressed. Recently, a Structural Maximum a Posteriori adaptation (SMAP) for a frame-based HMM model adaptation has been developed. In this method, acoustic model pdfs are organised in a tree and the means and variances of the pdfs are adapted using the linear transformations estimated under MAP ...
Maximum a posteriori adaptation for many-to-one eigenvoice conversion
Many-to-one eigenvoice conversion (EVC) allows the conversion from an arbitrary speaker’s voice into the pre-determined target speaker’s voice. In this method, a canonical eigenvoice Gaussian mixture model is effectively adapted to any source speaker using only a few utterances as the adaptation data. In this paper, we propose a many-to-one EVC based on maximum a posteriori (MAP) adaptation for...
Modeling Long Term Variability Information in Mixture Stochastic Trajectory Framework
The problem of acoustic modeling for speech recognizers is addressed. We distinguish two types of speech variability, long term (speaker identity, stationary noise, channel distortion) and short term (phoneme class). Currently, most recognizers model the two variabilities without considering their specificities, which may result in flat distributions with limited discriminability. In our system...
Modelling long term variability information in mixture stochastic trajectory framework
The problem of acoustic modeling for speech recognizers is addressed. We distinguish two types of speech variability, long term (speaker identity, stationary noise, channel distortion) and short term (phoneme class). Currently, most recognizers model the two variabilities without considering their specificities, which may result in flat distributions with limited discriminability. In our system...
Effect of Relevance Factor of Maximum a posteriori Adaptation for GMM-SVM in Speaker and Language Recognition
Gaussian mixture model support vector machine (GMM-SVM) with nuisance attribute projection (NAP) has been found to be effective and reliable for speaker and language recognition. In maximum a posteriori (MAP) adaptation of a GMM, the relevance factor is the parameter that regulates how much the adaptation data affect the base model, which impacts the final recognition performance. In our previous ...